Uyghur–Kazakh–Kirghiz Text Keyword Extraction Based on Morpheme Segmentation

Authors

Abstract

In this study, based on a morpheme segmentation framework, we researched a text keyword extraction method for the Uyghur, Kazakh and Kirghiz languages, which have similar grammatical and lexical structures. In these languages, affixes and a stem are joined together to form a word. A stem is a word particle with notional meaning, while the affixes perform grammatical functions. Because of these derivative properties, the vocabularies used in these languages are huge. Therefore, pre-processing is a necessary step in NLP tasks for Uyghur, Kazakh and Kirghiz. Morpheme segmentation enabled us to remove the suffixes as auxiliary units while retaining the meaningful stem, and it reduced the dimension of the feature space in the present keyword extraction task for Uyghur, Kazakh and Kirghiz texts. We transformed the morpheme segmentation task into the problem of labeling sequences, and used a Bi-LSTM network to bidirectionally obtain the position feature information of character sequences. We applied CRF to effectively learn the information of the preceding and following label sequences to build a highly accurate Bi-LSTM_CRF model, and prepared morpheme-based experimental text sets by using this model. Subsequently, we used the stem vectors' similarity to modify the TextRank algorithm, subsequent to training the stem embedding vectors with Doc2vec, and then performed a text keyword extraction experiment. In this experiment, the highest F1 scores of 43.8%, 44% and 43.9% were obtained for the three datasets. The experimental results show that the morpheme-based approach provides much better results than the word-based approach, which shows that stem vector similarity weighting is an efficient method for the text keyword extraction task, thus proving the efficiency of morpheme sequences for morphologically derivative languages.
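
The abstract does not give the exact formula used to modify TextRank, so the following is only a minimal sketch of one plausible reading: a co-occurrence graph over stems in which each edge is weighted by the cosine similarity of the two stems' embedding vectors. The function names, the window size, the damping factor, and the toy stems and 2-dimensional vectors are all assumptions made for illustration; in practice the vectors would come from a trained embedding model such as Doc2vec, and the stems from the segmentation step.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_textrank(stems, vectors, window=2, damping=0.85, iterations=50):
    """Rank stems with TextRank on a similarity-weighted co-occurrence graph.

    Assumption: each co-occurrence of two distinct stems within `window`
    positions adds their (non-negative) cosine similarity to the edge weight.
    """
    weights = defaultdict(float)
    for i, s in enumerate(stems):
        for j in range(i + 1, min(i + window + 1, len(stems))):
            t = stems[j]
            if s == t:
                continue
            sim = max(cosine(vectors[s], vectors[t]), 0.0)
            weights[(s, t)] += sim  # undirected graph: store both directions
            weights[(t, s)] += sim

    nodes = sorted(set(stems))
    # Total outgoing weight per node, used to normalise each node's votes.
    out_sum = {n: 0.0 for n in nodes}
    for (src, _dst), w in weights.items():
        out_sum[src] += w

    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            rank = sum(
                weights[(m, n)] / out_sum[m] * scores[m]
                for m in nodes
                if (m, n) in weights and out_sum[m] > 0
            )
            updated[n] = (1 - damping) + damping * rank
        scores = updated
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: English stand-ins for segmented stems, with made-up vectors.
toy_stems = ["book", "read", "book", "write", "book", "school"]
toy_vectors = {
    "book": [1.0, 0.2],
    "read": [0.9, 0.3],
    "write": [0.8, 0.4],
    "school": [0.1, 1.0],
}
ranked = weighted_textrank(toy_stems, toy_vectors)
top_keywords = [stem for stem, _score in ranked[:2]]
```

Weighting edges by similarity lets semantically close stems reinforce each other, whereas plain TextRank treats every co-occurrence equally; whether the paper combines similarity with co-occurrence counts in this way is not stated in the abstract.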

Similar resources

Keyword Extraction From Chinese Text Based On Multidimensional Weighted Features

To address the problems of incomplete coverage and low accuracy in keyword extraction from Chinese text, this paper proposes an extraction method based on the intrinsic features of the Chinese language and multidimensional information weighted eigenvalues. The method combines theoretical analysis and experimental calculation to study the parts of speech, word position, word length, semantic similarit...

Keyword Extraction for Text Characterization

Keywords are a valuable means of characterizing texts. To extract keywords, we propose an efficient and robust, language- and domain-independent approach based on small word parts (quadgrams). The basic algorithm can be improved by re-examining and re-ranking keywords using edit distance (i.e. Levenshtein distance) and an algorithm based on the relativistic addition of velocities ...

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and WordNet, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

Keyword Extraction Based on Implicit Feedback

To improve the results from search engines and make them more personalized for the user, we need to find out about the interests of a particular user. Many of the search personalization methods analyse documents visited by the user and from these documents infer the user’s interests. However, this approach is not accurate, because the user is rarely interested in the whole document; he might be...

Image Segmentation for Text Extraction

This paper presents a methodology for extracting text from images such as document images and scene images. Text that appears in these images contains important and useful information. Text extraction from images has been used in a large variety of applications such as mobile robot navigation, document retrieval, object identification, vehicle license plate detection, etc. In this paper, we emplo...

Journal

Journal title: Information

Year: 2023

ISSN: 2078-2489

DOI: https://doi.org/10.3390/info14050283